Abstract: We consider the problem of compressing memoryless
binary data with or without side information at the decoder. We
review the parity- and the syndrome-based approaches and discuss
their theoretical limits, assuming that there exists a virtual binary
symmetric channel between the source and the side information,
and that the source is not necessarily uniformly distributed.
We take a factor-graph-based approach in order to show how
to take full advantage of the readily available iterative decoding
procedures when turbo codes are employed, in either a parity- or
a syndrome-based fashion. We obtain a unified decoder
formulation that holds both for error-free and for error-prone
encoder-to-decoder transmission over generic channels. To support
the theoretical results, the different compression systems analyzed
in the paper are also tested experimentally. They are compared
against several different approaches proposed in the literature and
shown to be competitive in a variety of cases.

Index Terms — Source coding, Slepian-Wolf coding, message- 
passing, syndrome-based binning, parallel concatenated turbo 
codes. 



I. Introduction 

THE POSSIBILITY of employing turbo codes for data
compression has been exploited since the relation of
channel coding with distributed source coding was made clear.
The concept of distributed source coding (DSC) applies to
the scenario where many correlated sources must be compressed
without allowing the respective encoders, which send
their compressed outputs to a common decoder, to communicate
with each other. In their landmark paper [1], Slepian and Wolf
extended to this scenario the well-known result of Shannon for
a single information sequence, namely $R \ge H(X)$ for faithful
reproduction [2], [3], and showed that in terms of aggregate
rate of the compressed representation there are no losses with
respect to the traditional case where the encoders communicate
with each other. In practice, the power that would be used for
inter-encoder communications can be saved while achieving the same
compression performance. In [4], Wyner and Ziv investigated
instead the rate-distortion function in the DSC-related scenario
of source coding with side information (SCSI) at the decoder,
where a source must be compressed as usual but the decoder can
also rely on some correlated side information for reconstruction
within a given distortion limit. Again, if perfect reconstruction
is desired, they showed that there are no losses with respect to
the case where the encoder can access the side information too.
Since the results in [1] and [4] are obtained with a non-constructive
random binning approach, it took about thirty years
for practical SCSI systems to appear. As foreseen by Wyner in
his 1974 paper [5], in order to achieve the theoretical limits all
of them are based on concepts that are typical of channel
coding, such as syndromes or parities.

In particular, the very first practical SCSI system appeared
in [6] (see also [7], [8]). In this system, the syndrome relative
to a trellis code [9] is computed at the encoder in order to
signal the coset to which the current (quantized) source outcome
belongs. Then, the decoder reconstructs the data relying as well
on the side information. Similarly, in the case of near-lossless
binary data compression with binary side information, many
authors applied the syndrome-based approach of [5] relying
on low-density parity-check (LDPC) [10] or turbo [11] codes.
For example, syndromes relative to LDPC codes are used in
[12], [13], while syndromes relative to turbo codes are used
in [14], [15], [16], [17], [18]. While in the case of LDPC
codes the syndrome formation is straightforward (since
LDPC codes are defined directly by their parity-check
matrix), turbo-code-syndrome formation is less direct. In
[14], in addition to the principal trellis employed in traditional
channel coding, complementary trellises are used for syndrome
formation and decoding (as in [?]). A specific parity-check
matrix is instead employed for syndrome formation in [15], [16]
and [17]. Decoding is performed by means of standard turbo
decoding in [15], [17] and using the so-called syndrome trellis
in [16]. Some of these approaches are not limited to the SCSI
problem, but can also be applied to DSC, i.e. where no variable
is exactly known at the decoder [13], [15].

Formerly, rather than a syndrome-based approach, many
binary SCSI-related works dealing in particular with turbo codes
took a parity-based approach. Within the latter approach, the
side information is simply seen as a "dirty" version of the source
(possibly non-binary) that could be "channel-decoded" upon
receiving some parity bits, formed by the encoder with respect
to a systematic code. Even if the syndrome-based approach
is provably optimal while the parity-based one is not always
so [19], satisfactory results have been reported as well [20],
[21]. Moreover, in order to avoid the limitations of
the parity-based approaches it is possible to design the parity
formation procedure in an optimal way [22]. The parity-based
approaches have at least two advantages over syndrome-based
ones. First, error-prone encoder-to-decoder transmission over
more realistic channels than the traditional binary symmetric
channel (BSC) or binary erasure channel (BEC), over which
the parity bits become "dirty" (possibly non-binary) parities,
can be easily handled. Second, puncturing can be immediately
used for rate adaptation, and the resulting code is automatically
incremental. These properties were hence effectively exploited
for joint source-channel coding of a single information sequence
[23], [24], [25] or for DSC-based video compression with a
feedback channel [26].

In principle, it is possible to puncture turbo-syndrome bits too
in order to achieve an incremental source code [16], [27], [28],
but if the parity-check matrix is not properly chosen, erasures or
flips of the syndrome bits can make the correct reconstruction
of some elements of the source very hard and hence lead
to very poor performance [28]. If syndromes are computed
based on LDPC codes, syndrome decoders can instead handle
erased or "dirty" syndromes more successfully, since decoding
based on message-passing algorithms on factor graphs [29] can
easily model this scenario. In general, techniques for syndrome
protection against transmission losses can always be employed
that possibly allow "soft" information to be exchanged with an
actual syndrome decoder in order to maximize the performance
[30], [31]. However, many convolutional and turbo syndrome decoders
expect a strictly binary syndrome as input (and binary
side information), as for example in [14], [15], so that their
performance cannot be properly optimized in case of non-binary
syndrome transmission channels.

In this paper, we consider the problem of turbo-code-based
data compression, with or without side information at the
decoder. In particular, we tackle both problems of turbo-parity
and turbo-syndrome decoding from the point of view of a
general maximum a posteriori probability problem. As soon
as the probability function to be maximized is factorized into
its building terms, it becomes straightforward
to understand how conventional and readily available iterative
decoding algorithms used for turbo decoding can be applied
to the problem at hand. Under this novel perspective, it is
no longer necessary to introduce modified trellises in order to
perform decoding or to explicitly try to invert the syndrome
formation procedure used during encoding, as done in [14] and
in [15] respectively. Differently from the other contributions on
this subject, decoding is hence described in both parity- and
syndrome-based approaches using the same factor-graph-based
approach commonly taken in the LDPC-codes-related literature
[29]. Consequently, both binary (BSC and BEC) and non-binary
(e.g. additive white Gaussian noise) transmission channels are
handled under a unified formulation; moreover, non-binary side
information is handled as well. A similar result is reported in
[16], but employing ad hoc encoding and decoding techniques
on modified trellises.

The rest of the paper is organized as follows. In Section II we
review the practical approaches to the SCSI problem that have
appeared so far, namely the parity- and the syndrome-based approaches,
and investigate their theoretical performance when there exists
a virtual BSC between the source and the side information.
Section III particularizes the parity- and the syndrome-based
approaches to the case where turbo codes are employed, and
shows how standard turbo encoding and decoding algorithms
can be practically applied for data compression. In Section
IV we show the compression performance of the discussed
algorithms in a variety of settings, and compare them against
other results in the literature referring to both parity- and
syndrome-based systems. Concluding remarks on this work are
given in Section V.

II. Theoretical Limits and Connections to Channel Coding

Let a data source emit independent and identically distributed
(i.i.d.) realizations of a pair of correlated discrete-alphabet
random variables (r.v.) (X, Y). From the source coding theorem
[2], a sequence of n of these realizations can be encoded (with
an arbitrarily small probability of decoding error) using on
average R bit/realization iff $R \ge H(X, Y)$ and n is sufficiently
large, where $H(\cdot, \cdot)$ denotes the joint entropy [3].

Surprisingly, if the encoder were made of two independent
components that cannot communicate with each other, one for
encoding X and the other for encoding Y, then the lower bound
for the total rate would be the same. In particular, a sequence
of n joint realizations can be encoded (with an arbitrarily
small probability of decoding error) using on average $R_X$ and
$R_Y$ bit/sample by the X- and by the Y-component of the
distributed encoder, respectively, iff $R_X + R_Y \ge H(X, Y)$,
$R_X \ge H(X|Y)$, $R_Y \ge H(Y|X)$, and n is sufficiently large,
where $H(\cdot|\cdot)$ denotes the conditional entropy [?].

Consequently, in the problem of lossless SCSI, where X
is encoded with Y being perfectly known at the decoder
(i.e. $R_Y \ge H(Y)$), the lower bound for the source coding rate
$R_s$ is $R_s \ge H(X|Y)$, where $H(X|Y) \le H(X)$. In order to actually
construct a coding system that reaches this limit, the lossless SCSI
problem was first recast as a channel coding problem. The interpretation
of X and Y as inputs or outputs of a virtual correlation channel
(CC) is the key that led to this connection.
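The rate conditions above can be checked numerically on a small example. The following is a minimal Python sketch, not taken from the paper: the function names and the joint distribution (a uniform binary X observed through a BSC with crossover probability 0.1) are assumptions chosen only for illustration.

```python
import math

def entropy(probs):
    """Entropy in bits of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def sw_quantities(joint):
    """Return H(X,Y), H(X|Y), H(Y|X) for joint[x][y] = P[X=x, Y=y]."""
    h_xy = entropy(p for row in joint for p in row)
    h_x = entropy(sum(row) for row in joint)
    h_y = entropy(sum(col) for col in zip(*joint))
    return h_xy, h_xy - h_y, h_xy - h_x

def in_sw_region(r_x, r_y, joint):
    """True iff the rate pair satisfies the three conditions stated above."""
    h_xy, h_x_given_y, h_y_given_x = sw_quantities(joint)
    return r_x >= h_x_given_y and r_y >= h_y_given_x and r_x + r_y >= h_xy

# Hypothetical example: uniform X, Y = X xor Z with P[Z = 1] = 0.1.
joint = [[0.45, 0.05],
         [0.05, 0.45]]

print(sw_quantities(joint))
print(in_sw_region(0.5, 1.0, joint))   # Y sent at full rate, X at 0.5 bit/sample
print(in_sw_region(0.7, 0.7, joint))   # sum rate below H(X,Y)
```

With Y sent at $R_Y = 1 \ge H(Y)$, the pair is admissible as soon as $R_X \ge H(X|Y)$, which is the lossless SCSI setting discussed above.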

The vast majority of the literature discusses the binary case
and assumes that X and Y are connected by a BSC. More
precisely, the side information Y is seen either as the output
("forward" BSC model) or as the input ("backward" BSC
model) of the BSC. In the first case there exists a r.v. $Z^f$
independent of X such that $Y = X \oplus Z^f$; in the second one
there exists a r.v. $Z^b$ independent of Y such that $X = Y \oplus Z^b$.^1
However, it is often also assumed that the source X is uniformly
distributed (u.d.). In both cases, this implies (i) that Y is u.d. as
well, and (ii) that it does not actually matter if the BSC is seen
in one or in the other direction (i.e. $Z^f$ or $Z^b$ turns out to be
independent of both X and Y).

In this paper, instead, we focus on the case where X can 
possibly be non-u.d., i.e. where the CC being a "forward" 
or "backward" BSC actually matters and leads to different 
considerations, from both the theoretical and the practical point 
of view. In the rest of this section, we will take the theoretical 
perspective, and discuss the implications of the two models with 
both the parity and the syndrome approaches. 

^1 In both cases it is assumed that $p^* = P[Z^* = 1]$ satisfies $0 < p^* < 1/2$,
which is not restrictive. In fact, if $p^* = 0$ then X = Y, if $p^* = 1/2$ then X
and Y are actually independent, and if $p^* > 1/2$ then the CC could be simply
seen as a BSC with error probability $1 - p^* < 1/2$ followed by a deterministic
symbol inversion.
A. Forward BSC Model

According to this model, (i) the lower bound for compressing
X is $H(X|Y) = H(Y|X) - [H(Y) - H(X)] = H(Z^f) - [H(Y) - H(X)]$,
which satisfies $H(X|Y) < H(Z^f)$ (unless X
is u.d.), and (ii) the unconstrained capacity of the virtual BSC
is $C^f = 1 - H(Z^f) < 1$ bit/channel use. Since the capacity
of a BSC can be approached by linear codes [32], and linear
codes can always be generated by a systematic encoder, the first
devised strategy for the lossless SCSI problem was the parity-based
approach.

1) Parity-based approach: First, a linear (n, k) code approaching
the capacity of the virtual BSC, i.e. such that it
achieves an arbitrarily small probability of decoding error with
$R_c = k/n = C^f$, is taken (in general, $R_c \le C^f$). Then, each
successive sequence of k realizations from X is fed into the
corresponding channel encoder, which computes n - k parity bits.
These bits form the compressed representation and are sent to
the channel decoder. In turn, the channel decoder "receives" as
well the corresponding k realizations from Y and reconstructs
the "transmitted" codeword, made of the k bits from the source
(if no decoding errors have occurred) and of the n - k parity
bits (which are simply discarded).

The compression rate achieved by the parity-based approach
is $R_s = \frac{n-k}{k} = \frac{1}{R_c}(1 - R_c)$ bit/sample.^2 Consequently,
$R_s \ge \frac{1}{C^f} H(Z^f) > H(Z^f) \ge H(X|Y)$, showing that this approach
cannot achieve the theoretical bound, not even in the case of X
being u.d.
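To get a feeling for the size of this gap, the bound $H(Z^f)/C^f$ can be evaluated for a few crossover probabilities. The sketch below is only illustrative: the function names and the chosen values of p are assumptions, and the bound is the one obtained by setting $R_c = C^f$ in the expression for $R_s$.

```python
import math

def h(p):
    """Binary entropy function, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def parity_rate(n, k):
    """Rate (n-k)/k, in bit/sample, of a parity-based system built on an (n, k) code."""
    return (n - k) / k

def parity_rate_bound(p):
    """Lower bound H(Z^f)/C^f on the parity-based rate for crossover probability p."""
    return h(p) / (1.0 - h(p))

for p in (0.01, 0.05, 0.11):
    print(f"p={p}: H(Z^f)={h(p):.3f}  parity-based bound={parity_rate_bound(p):.3f}")
```

For p = 0.11 the entropy $H(Z^f)$ is about 0.5 bit, yet the bound on the parity-based rate is already close to 1 bit/sample, i.e. almost no compression is obtained when the parity message is sent losslessly.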

Nevertheless, in this scenario the channel decoder could
actually "correct" losses in the parity message as well, which is
more than what is needed for losslessly decoding X. For example,
as long as the parity message is received as if it had gone
through a BSC with error rate less than or equal to that of the
virtual BSC, the decoder would still be able to reconstruct X
with arbitrarily small probability of error. Hence, either we
conclude that the approach is suboptimal, but somewhat robust
against error-prone transmission of the parity message, or we
deliberately employ lossy compression of the parity message at
the encoder,^3 achieving an actual rate reduction.

The maximum rate reduction of this quantized parity-based
approach is achieved when the error rate of the "quantization"
BSC is equal to that of the virtual BSC. In this case, the
rate is reduced by a factor $1 - H(Z^f) = C^f$, leading to a
compression rate $C^f R_s \ge H(Z^f) \ge H(X|Y)$, where
the first inequality is an equality iff the code achieves capacity
and the second inequality is an equality iff X is u.d. Finally,
we conclude that the quantized parity-based approach achieves
the theoretical bound, but only in the case of X being u.d.^4

^2 As a remark, note that even if X is not u.d. the parity message is i.i.d. and
u.d., at least asymptotically with the codelength n, so that no further rate
reduction is possible as long as the parity message is sent losslessly.

^3 Since the parity message is i.i.d. and u.d., the effect of optimal lossy
compression can be taken into account by assuming the existence of a
"quantization" BSC between the true parity message and the one sent to the
decoder [3].

Alternatively, the parity message could be formed with respect
to a higher-rate ad hoc code achieving the capacity
of the true channel, over which the parity is known to be
not harmed. The average capacity of this channel is
$C' = \frac{k}{n}[1 - H(Z^f)] + \frac{n-k}{n} = 1 - \frac{k}{n} H(Z^f)$,
so that the channel coding rate constraint could be relaxed to
$R_c \le C'$ (with $C' \ge C^f$), which
in turn would lead to $R_s \ge H(Z^f) \ge H(X|Y)$, i.e. to the same
conclusions obtained for the quantized parity-based approach.
However, it turns out that this approach is rather a syndrome-based
and not a parity-based one. In fact, the discussion of
the syndrome-based approach in the following will show that
the parity message with respect to an ad hoc code could be
simply seen as a syndrome message.

2) Syndrome-based approach: Again, take a linear (n, k)
code approaching the capacity of the virtual BSC. This code
partitions the set of all sequences of n symbols from the input
alphabet into $2^{n-k}$ cosets that are as good as the original code
for channel coding. Then, for each successive
sequence of n realizations from X, the encoder identifies the
coset to which that sequence belongs. This information is
encoded as n - k syndrome bits that form the compressed
representation and are sent to the decoder. Upon decoding
(in principle) the corresponding n realizations from Y (i.e. a
corrupted codeword) into the signalled coset, the n bits from
the source can be eventually reconstructed with an arbitrarily
small probability of error.

As noted in [19], from an (n, k) linear code used in a
syndrome-based SCSI system an ad hoc (2n - k, n) systematic
encoder can be derived for an equivalent parity-based SCSI
system. In fact, the n - k bits used to specify the coset
information can be seen as parity bits.

Differently from the parity-based approach, in the syndrome-based
approach it is not necessary to employ quantization in
order to achieve the compression limit. In fact, the compressed
representation now requires $R_s = \frac{n-k}{n} = 1 - R_c$ bit/sample.^5
Hence, $R_s \ge H(Z^f)$ (with equality iff the code achieves
capacity), and in turn $H(Z^f)$ equals $H(X|Y)$ iff X is u.d. If
we let q denote the probability of X being one, the compression
rate loss $\Delta = H(Z^f) - H(X|Y)$ for a fixed
value of $H(X|Y)$ is shown in Fig. 1. We conclude that the
syndrome-based approach achieves the theoretical bound, but
again only in the case of X being u.d.
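The rate loss $\Delta$ can be evaluated numerically. The following Python sketch is an illustration under stated assumptions, not the procedure used to produce the figure: for a given q it searches by bisection for the crossover probability p of the forward BSC that yields a prescribed $H(X|Y)$ (using the fact that $H(X|Y)$ grows with p on [0, 1/2]), and then returns $H(Z^f) - H(X|Y)$. All function names are hypothetical.

```python
import math

def h(p):
    """Binary entropy function, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def cond_entropy(q, p):
    """H(X|Y) for P[X=1]=q and Y = X xor Z^f with P[Z^f=1]=p (forward BSC)."""
    p_y1 = q * (1 - p) + (1 - q) * p          # P[Y = 1]
    return h(q) + h(p) - h(p_y1)              # H(X) + H(Y|X) - H(Y)

def rate_loss(q, target):
    """Delta = H(Z^f) - H(X|Y) when p is tuned so that H(X|Y) equals `target`."""
    if target >= h(q):
        raise ValueError("H(X|Y) stays below H(X) for p < 1/2")
    lo, hi = 0.0, 0.5
    for _ in range(100):                      # bisection on p
        mid = (lo + hi) / 2
        if cond_entropy(q, mid) < target:
            lo = mid
        else:
            hi = mid
    return h(hi) - target

for q in (0.1, 0.2, 0.3, 0.5):
    print(q, round(rate_loss(q, h(0.05)), 4))
```

For q = 0.5 the loss vanishes, in agreement with the statement that the bound is achieved only when X is u.d.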

[Figure: compression rate loss versus q, with one curve for each of
H(X|Y) = h(0.01), H(X|Y) = h(0.05), and H(X|Y) = h(0.10).]

Fig. 1. Compression rate loss under the forward BSC model when X is not
u.d. ($h(\cdot)$ denotes the binary entropy function).

^4 In this paper, both for the forward and for the backward case, we always
assume that the channel decoder is informed about (i) the exact encoding process
(which code, parity or syndrome, quantization used or not, ...), (ii) the statistics
of the transmission channel (TC) between encoder and decoder, and (iii) the
statistics of the virtual BSC, but not necessarily about the statistics of X. If these
statistics were known at the decoder, then the actual channel code (over which
the decoder conducts its search) would be a subset of the linear code used for
parity formation, and our conclusions would be incorrect (namely, fewer parity
bits could be sufficient for correct decoding).

^5 Similarly to the parity message, and at least asymptotically with the
codelength n, the syndrome message is i.i.d. and u.d. even if X is not u.d.

For example, the (3,1) Hamming code (i.e. the (3,1) repetition
code) is seen as a linear code achieving the capacity
of the additive channel (on $GF(2^3)$) in which $Z^f$ is such
that $p_{Z^f}(000) = p_{Z^f}(001) = p_{Z^f}(010) = p_{Z^f}(100) = 1/4$
[6], [22]; in fact, this code can correct these error patterns
and $C^f = 3 - H(Z^f) = 1$ bit/channel use. The 4 cosets
of the partition are {000, 111}, {001, 110}, {010, 101}, and
{011, 100}. While it is usually stated that this syndrome-based
system achieves $H(X|Y)$, it should be emphasized that under
a forward BSC model this is true iff X is u.d. As an example,
if $p_X(000) = p_X(001) = p_X(010) = p_X(011) = 1/16$
and $p_X(111) = p_X(110) = p_X(101) = p_X(100) = 3/16$,
the information about the coset sent by the encoder requires
$R_s = H(Z^f) = 2$ bit/sample (and no less than this), but
$p_Y(000) = p_Y(001) = p_Y(010) = p_Y(011) = 3/32 = 1.5/16$
and $p_Y(111) = p_Y(110) = p_Y(101) = p_Y(100) = 5/32 = 2.5/16$,
i.e. Y has a strictly greater entropy than X. Hence,
$R_s = H(Z^f) > H(X|Y)$.^6
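The numbers in this example can be verified directly. The short Python sketch below (variable and function names are illustrative assumptions) rebuilds the four cosets of the (3,1) repetition code, derives the distribution of Y from the stated distributions of X and $Z^f$, and evaluates $H(X|Y) = H(X) + H(Z^f) - H(Y)$.

```python
import math
from itertools import product

ERRORS = ["000", "001", "010", "100"]      # equiprobable patterns of Z^f

def xor(a, b):
    """Bitwise XOR of two equal-length bit strings."""
    return "".join("1" if u != v else "0" for u, v in zip(a, b))

def entropy(dist):
    """Entropy in bits of a dictionary {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def syndrome(w):
    """Syndrome of w for the (3,1) repetition code (checks x0+x1 and x0+x2)."""
    b = [int(c) for c in w]
    return (b[0] ^ b[1], b[0] ^ b[2])

words = ["".join(bits) for bits in product("01", repeat=3)]

# Cosets: words sharing the same 2-bit syndrome.
cosets = {}
for w in words:
    cosets.setdefault(syndrome(w), []).append(w)

# Source distribution of the example: 1/16 for 0xx, 3/16 for 1xx.
p_x = {w: (1 if w[0] == "0" else 3) / 16 for w in words}

# Distribution of Y = X xor Z^f.
p_y = {w: 0.0 for w in words}
for x, px in p_x.items():
    for z in ERRORS:
        p_y[xor(x, z)] += px / 4

h_z = entropy({z: 0.25 for z in ERRORS})               # 2 bits
h_x_given_y = entropy(p_x) + h_z - entropy(p_y)        # H(X) + H(Y|X) - H(Y)

print(sorted(cosets.values()))
print(p_y["000"], p_y["111"])
print(entropy(p_x), entropy(p_y), h_x_given_y)
```

The computed $H(X|Y)$ is below the 2 bit/sample spent on the syndrome, which is the gap discussed above.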

B. Backward BSC Model 

According to this model, the lower bound for compressing X
is simply $H(X|Y) = H(Z^b)$; the unconstrained capacity of the
virtual BSC is $C^b = 1 - H(Z^b) < 1$ bit/channel use. Despite
the correlation structure, which is genuinely different when X is not
u.d., both the parity- and the syndrome-based
encoding procedures (with respect to a linear code achieving
the capacity of the virtual BSC) can be employed exactly as
in the case of the forward BSC model. However, the decoding
algorithm can now be related to the channel decoding strategy
that would be used for reliable transmission over the virtual
BSC only in the case of the syndrome-based approach.

^6 But, if the statistics of X were known at the decoder, then the actual
code (over which the decoder conducts its search) would be a subset of the
coset specified by the syndrome message. Our conclusions would be in this
case incorrect (namely, cosets could be made "larger" and still achieve correct
decoding, while requiring fewer bits for the corresponding syndrome message).

In particular, since the syndrome of each
successive sequence of n realizations from X can be generated
by means of a linear function, and the decoder can similarly
generate the syndrome of the corresponding sequence of n
realizations from Y, the decoder can first compute the syndrome
of the difference. Hence, it actually knows the coset in which
the difference $Z^b$ lies. But the code and its cosets are such
that the typical errors across the virtual BSC can be corrected.
Eventually, the actual difference (and the actual source) can
be found with arbitrarily small probability of error as the only
typical element in that particular coset.
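The procedure just described can be sketched on a toy code. The example below is an assumption made for illustration (the paper works with capacity-approaching codes, not with this one): it uses the (7,4) Hamming code, whose cosets each contain exactly one pattern of weight at most one, so that the "typical" difference is taken to be a difference with at most one nonzero bit.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code: column j-1 holds the binary
# expansion of j, so a single 1 in position j-1 has a syndrome that spells j.
H = np.array([[(j >> b) & 1 for j in range(1, 8)] for b in range(3)],
             dtype=np.uint8)

def syndrome(v):
    """3-bit syndrome of a length-7 binary vector."""
    return (H @ v) % 2

def decode(s_x, y):
    """Recover x from its syndrome s_x and the side information y."""
    s_z = (s_x + syndrome(y)) % 2                   # syndrome of z = x xor y
    pos = int(s_z[0]) + 2 * int(s_z[1]) + 4 * int(s_z[2])
    z = np.zeros(7, dtype=np.uint8)
    if pos:                                         # pos == 0: x and y coincide
        z[pos - 1] = 1                              # only weight-1 element of the coset
    return (y + z) % 2

x = np.array([1, 0, 1, 1, 0, 0, 1], dtype=np.uint8)   # source block
z = np.array([0, 0, 0, 0, 1, 0, 0], dtype=np.uint8)   # difference with one flip
y = (x + z) % 2                                        # side information
print(syndrome(x), decode(syndrome(x), y))
```

Here 7 source bits are represented by 3 syndrome bits, i.e. $R_s = 3/7$ bit/sample, and reconstruction is exact whenever x and y differ in at most one position.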

The compression rate achieved by the syndrome-based approach
is again $R_s = \frac{n-k}{n} = 1 - R_c$ bit/sample. But, differently
from the forward BSC case, $R_s \ge H(Z^b) = H(X|Y)$, with
equality iff the code achieves capacity. Hence, we conclude
that the syndrome-based approach achieves the
theoretical SCSI bound, independently of the actual statistics
of X.

Although in the case of the parity-based approach the connection
to the channel decoding strategy for the virtual BSC
is less straightforward, this does not represent a problem. In
fact, the optimal SCSI decoder, as discussed in the next section,
can be readily derived as the solution of a maximum a posteriori
probability (MAP) problem. In addition, it turns out that the
decoder can still reuse the typical channel decoding algorithms
that would be used for reliable transmission over the virtual
BSC, confirming once again that the SCSI problem is in fact a
channel coding problem.

1) Compression without Side Information: The SCSI problem
is obviously an extension of the simpler source coding
problem (without side information). In particular, the source
coding problem falls exactly into the backward BSC model. In
fact, it is sufficient to assume that a "fake" side information Y
exists, which is identically zero, and that $Z^b = X$. Hence, all
conclusions about the SCSI algorithms applied to the backward
BSC model hold for the simpler problem of source compression
too.

III. SCSI and Data Compression Using Turbo Codes

From the discussion above, it appears that the search for
"good" SCSI systems reduces to the search for "good" channel
codes. As turbo codes [11], [32] come very close to achieving
the promise of Shannon's channel capacity theorem, many
SCSI systems that have appeared in the literature take advantage
of them.

A. Turbo-Parity Formation 

The conventional (parallel concatenated) turbo encoder is
a systematic encoder: for a sequence of
Nk realizations from X (denoted $\mathbf{x}$) it uses two systematic (n, k)
convolutional codes to form two sequences of parity bits of
$N(n-k) + z_t$ bits each^7 ($\mathbf{p}_0$ and $\mathbf{p}_1$). The source bits enter
the second convolutional encoder, which can also be equal to the
first, after being randomly interleaved (reordered).

The turbo code can hence be seen as a giant $(N(2n-k) + 2z_t, Nk)$
systematic block code whose generator matrix is

$$G = [\, I_{Nk} \,|\, P_0 \,|\, P_1 \,],$$

where $P_i$ is the $Nk \times [N(n-k) + z_t]$ parity formation matrix
corresponding to the i-th convolutional code (comprehensive of
the interleaver if i = 1).
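The parity formation step can be sketched with a toy encoder. The following Python example rests on several assumptions that are not taken from the paper: (2,1) recursive systematic constituent codes with feedback polynomial $1 + D + D^2$ and feedforward polynomial $1 + D^2$, a pseudo-random interleaver with a fixed seed, and no trellis termination (i.e. $z_t = 0$).

```python
import random

def rsc_parity(bits):
    """Parity sequence of a (2,1) recursive systematic convolutional code
    (feedback 1 + D + D^2, feedforward 1 + D^2), starting from the zero state."""
    s1 = s2 = 0
    out = []
    for u in bits:
        a = u ^ s1 ^ s2          # feedback: a = u + a*D + a*D^2
        out.append(a ^ s2)       # feedforward: parity = a + a*D^2
        s1, s2 = a, s1
    return out

def interleaver(n, seed=0):
    """Pseudo-random permutation of range(n) (hypothetical choice of interleaver)."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm

def turbo_parities(x, perm):
    """Parity sequences p0 (natural order) and p1 (interleaved order)."""
    p0 = rsc_parity(x)
    p1 = rsc_parity([x[i] for i in perm])
    return p0, p1

x = [1, 0, 1, 1, 0, 0, 1, 0]
perm = interleaver(len(x))
p0, p1 = turbo_parities(x, perm)
codeword = x + p0 + p1           # systematic codeword [x | p0 | p1]
print(p0, p1, len(codeword))
```

Since both constituent encoders are linear over GF(2) and start from the zero state, the map from x to [x | p0 | p1] is linear, which is what allows the turbo code to be described by a generator matrix of the form given above.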

^7 The additional $z_t \ll N(n-k)$ bits are emitted for terminating the encoding
into the zero state [34].

Before sending the parity to the decoder, puncturing (i.e. bit
removal) can be employed for rate adaptation at the encoder.
The encoder can hence operate at any rate $0 \le R_s \le \frac{2(n-k)}{k}$
bit/sample (neglecting the $z_t$ termination bits); in
particular, for appropriate choices of (n, k), rates greater than 1
bit/sample are possible too. If puncturing is employed, then the
equivalent generator matrix is $G' = [\, I_{Nk} \,|\, P'_0 \,|\, P'_1 \,]$, where $P'_i$ is
the $Nk \times s_i$ matrix obtained by removing from $P_i$ the columns
corresponding to the punctured parity bits.
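Rate adaptation by puncturing amounts to simple bookkeeping on the parity sequences. The sketch below assumes a regular puncturing pattern and (2,1) constituent codes without termination; both choices, like the function names, are illustrative assumptions rather than the patterns used in the paper.

```python
def puncture(parity, period):
    """Keep one parity bit out of every `period` bits (regular pattern)."""
    return parity[::period]

def parity_rate(num_source_bits, p0, p1):
    """Compression rate in bit/sample when the two parity sequences are sent."""
    return (len(p0) + len(p1)) / num_source_bits

N = 12                 # source bits for (2,1) constituent codes, z_t = 0
p0 = [0] * N           # placeholder parities: only their lengths matter here
p1 = [0] * N

full = parity_rate(N, p0, p1)                             # no puncturing
half = parity_rate(N, puncture(p0, 4), puncture(p1, 4))   # keep 1 bit in 4
print(full, half)
```

Without puncturing the rate is 2 bit/sample, an instance of the rates greater than 1 bit/sample mentioned above; keeping one parity bit in four brings it down to 0.5 bit/sample.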



B. Turbo-Syndrome Formation

From the generator matrix of the turbo code, the parity-check
matrix

$$H' = \begin{bmatrix} {P'_0}^{T} & I_{s_0} & 0 \\ {P'_1}^{T} & 0 & I_{s_1} \end{bmatrix}$$

is immediately derived that can be used for syndrome formation.
All other parity-check matrices can be derived from H' by left-multiplication
with an invertible $(s_0 + s_1) \times (s_0 + s_1)$ matrix:
in case of an error-prone TC it is possible that some of them
lead to a better performance than H' [28], [16].

However, if H' is employed, for a sequence
of $Nk + s_0 + s_1$ outcomes from X (partitioned into the three
sub-sequences $\mathbf{x}$, $\mathbf{x}_0$, and $\mathbf{x}_1$ of length Nk, $s_0$, and $s_1$ respectively),
the syndrome (i.e. the right-multiplication of $[\mathbf{x}|\mathbf{x}_0|\mathbf{x}_1]$ by ${H'}^{T}$)
can be simply obtained by (i) forming the (punctured) parities
$\mathbf{p}_0$ and $\mathbf{p}_1$ corresponding to $\mathbf{x}$, and (ii) adding $\mathbf{p}_i$ to $\mathbf{x}_i$.
The syndrome message $[\mathbf{s}_0, \mathbf{s}_1]$ is hence made of two sequences
$\mathbf{s}_i = \mathbf{p}_i \oplus \mathbf{x}_i$ of $s_i$ bits each that are directly obtained by reusing
the conventional turbo encoder.

The encoder can operate at any rate $0 \le R_s \le \frac{2(n-k)}{2n-k} < 1$
bit/sample (neglecting the $z_t$ termination bits); in particular, no
rates greater than 1 bit/sample are
realizable: for example, if (2, 1) constituent codes are employed,
then $0 \le R_s \le 2/3$ bit/sample.
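The equivalence between the matrix definition of the syndrome and its computation through the parity encoder can be checked numerically. In the sketch below the matrices standing in for $P'_0$ and $P'_1$ are random binary matrices, an assumption made only to keep the example short (in the actual system they are determined by the constituent codes, the interleaver, and the puncturing pattern); the sizes are likewise arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
Nk, s0, s1 = 8, 3, 3                                     # arbitrary toy sizes

P0 = rng.integers(0, 2, size=(Nk, s0), dtype=np.uint8)   # stand-in for P'_0
P1 = rng.integers(0, 2, size=(Nk, s1), dtype=np.uint8)   # stand-in for P'_1

def eye(m):
    return np.eye(m, dtype=np.uint8)

def zeros(r, c):
    return np.zeros((r, c), dtype=np.uint8)

G = np.hstack([eye(Nk), P0, P1])                         # G' = [I | P'_0 | P'_1]
H = np.block([[P0.T, eye(s0), zeros(s0, s1)],
              [P1.T, zeros(s1, s0), eye(s1)]])           # H' as defined above

def syndrome_via_encoder(x, x0, x1):
    """s_i = p_i xor x_i, with p_i the (punctured) parity of x."""
    p0 = (x @ P0) % 2
    p1 = (x @ P1) % 2
    return np.concatenate([(p0 + x0) % 2, (p1 + x1) % 2])

x = rng.integers(0, 2, size=Nk, dtype=np.uint8)
x0 = rng.integers(0, 2, size=s0, dtype=np.uint8)
x1 = rng.integers(0, 2, size=s1, dtype=np.uint8)

s_matrix = (np.concatenate([x, x0, x1]) @ H.T) % 2       # [x|x0|x1] H'^T
s_encoder = syndrome_via_encoder(x, x0, x1)
rate = (s0 + s1) / (Nk + s0 + s1)                        # syndrome bits per source bit
print(s_matrix, s_encoder, rate)
```

Both computations give the same syndrome, and every row of G' is orthogonal to every row of H', as required of a generator and parity-check pair. The rate of this toy configuration is 6/14 bit/sample and, by construction, can never reach 1.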

C. Unified Decoding 

From the discussion in Section II it may seem that the actual
SCSI decoding procedures cannot be exactly the ones employed
in channel decoding, and also must depend on the correlation
model and on the particular message received at the decoder
(parity or syndrome). In fact, many contributions on this subject
proposed ad hoc decoding techniques, tailored for the specific
settings treated. However, as shown in [35], these techniques are
actually in most cases the same and could be simply derived
by tackling the problem as a MAP one. In the turbo case, this
strategy immediately indicates how to reuse the conventional
turbo decoder in the SCSI decoder.

In the parity case, assuming that the parity messages $\mathbf{p}_i$
are received at the decoder as $\mathbf{r}_i$, and that $\mathbf{y}$ denotes the
Nk-dimensional realization of Y corresponding to the source
realization $\mathbf{x}$ being compressed, the optimum MAP estimate is
simply found as^8

$$\arg\max_{\mathbf{x}} p(\mathbf{x} | \mathbf{y}\mathbf{r}_0\mathbf{r}_1) = \arg\max_{\mathbf{x}} p_{\mathbf{y}\mathbf{r}_0\mathbf{r}_1}(\mathbf{x}).$$

This probability can be obtained by marginalizing the function
$p_{\mathbf{y}\mathbf{r}_0\mathbf{r}_1}(\mathbf{x}\mathbf{p}_0\mathbf{p}_1)$ that, apart from some scaling factors, factorizes into
the product of (i) $p(\mathbf{x}\mathbf{y})$, (ii) $\chi_i(\mathbf{p}_i|\mathbf{x})$, which are indicator
functions equal to one iff $\mathbf{p}_i$ is the actual parity of $\mathbf{x}$ (according to the
i-th convolutional code, and comprehensive of the interleaver if
i = 1), and (iii) $l_{\mathbf{r}_i}(\mathbf{p}_i)$, which take the TC into account. As
a remark, note that if $\mathbf{y}$ is seen as the systematic portion of
the codeword "received" by the decoder, this maximization is
exactly the one performed by an optimal MAP channel decoder,
i.e. there is no difference between a turbo decoder and a SCSI
decoder.

^8 Given two r.v. A and B, the likelihood and a posteriori probability (APP)
functions will hereafter be denoted by $l_a(b) = p(a|b)$ and $p_a(b) = p(b|a)$
respectively, so that the "free" variable always appears as argument of the
function while the known one always appears as subscript (parameter).

[Figure: (a) parity-based approach; (b) syndrome-based approach.]

Fig. 2. Factor graphs for iterative decoding in the problem of SCSI using
turbo codes. At each turbo iteration involving one of the two constituent codes
(e.g. the one whose function to be marginalized is represented by the sub-graph
in the dashed box), the incoming message across the box is seen as prior
information about $\mathbf{x}$.

In practice, for turbo codes, an iterative procedure is used
for approximating MAP decoding that is easily described as a
message-passing algorithm on the factor graph representing this
factorization, shown in Fig. 2(a) (for a useful tutorial article on
factor graphs and message-passing algorithms, the reader is referred
to [29]). The traditional decoding algorithm employs the
forward-backward algorithm (also known as the BCJR algorithm
[36]) in order to exactly marginalize the function relative to one
constituent code (represented by the sub-graph in the dashed
box, which has no cycles) using the message incoming from
the previous iteration (involving the other constituent code) as
additional prior information about $\mathbf{x}$, and produces a new prior
for the next iteration.

In the syndrome case, assuming that the syndrome messages
$\mathbf{s}_i$ are received at the decoder as $\mathbf{r}_i$, and that the sequence of
$Nk + s_0 + s_1$ realizations from Y corresponding to the source
realization being compressed is equivalently partitioned into the
three sub-sequences $\mathbf{y}$, $\mathbf{y}_0$, and $\mathbf{y}_1$, the optimum MAP estimate
is found as

$$\arg\max_{\mathbf{x}\mathbf{x}_0\mathbf{x}_1} p_{\mathbf{y}\mathbf{y}_0\mathbf{y}_1\mathbf{r}_0\mathbf{r}_1}(\mathbf{x}\mathbf{x}_0\mathbf{x}_1).$$

This probability can now be obtained by marginalizing the function
$p_{\mathbf{y}\mathbf{y}_0\mathbf{y}_1\mathbf{r}_0\mathbf{r}_1}(\mathbf{x}\mathbf{x}_0\mathbf{x}_1\mathbf{p}_0\mathbf{p}_1\mathbf{s}_0\mathbf{s}_1)$ that, apart from some scaling factors,
factorizes into the product of (i) $p(\mathbf{x}\mathbf{y})p(\mathbf{x}_0\mathbf{y}_0)p(\mathbf{x}_1\mathbf{y}_1)$, (ii)
$\chi_i(\mathbf{p}_i|\mathbf{x})$, (iii) $\chi\{\mathbf{p}_i \oplus \mathbf{x}_i = \mathbf{s}_i\}$, which are indicator functions of the
condition in brackets, and (iv) $l_{\mathbf{r}_i}(\mathbf{s}_i)$. The only difference
w.r.t. the parity case is represented by the introduction of
the factors in (iii). Most importantly, these factors do not
add any cycle in the factor graph, as shown in Fig. 2(b).
Hence, decoding can be performed by reusing the turbo decoding
algorithm presented above. In particular, it is only necessary to
form the correct input likelihoods to the parity nodes $\mathbf{p}_i$ and then
post-process their final APP approximation for obtaining the
APP approximation for the source nodes $\mathbf{x}_i$. In the following,
this decoding procedure is referred to as soft-syndrome decoding
(SSD), while the suboptimal SCSI decoding procedure resulting
from employing a turbo decoder that communicates only the
hard choices for $\mathbf{p}_i$ (and not their full APP functions) is referred
to as hard-syndrome decoding (HSD).
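One possible reading of the two steps above, in the log-likelihood-ratio (LLR) domain, is sketched below. This is an interpretation and not code from the paper: it assumes the convention LLR = log(P[bit = 0] / P[bit = 1]), treats each constraint $p \oplus x = s$ of type (iii) as a degree-three check node, and uses hypothetical function and argument names. The quantity called `llr_parity_extrinsic` is assumed to be the decoder's APP LLR for a parity bit minus the input LLR that was supplied for it.

```python
import math

def boxplus(a, b):
    """LLR of the XOR of two independent bits having LLRs a and b."""
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return (sign * min(abs(a), abs(b))
            + math.log1p(math.exp(-abs(a + b)))
            - math.log1p(math.exp(-abs(a - b))))

def parity_input_llr(llr_syndrome, llr_side):
    """Input LLR for a parity bit p = s xor x, given the LLR of the received
    syndrome bit (from the TC) and the LLR of x obtained from its side information."""
    return boxplus(llr_syndrome, llr_side)

def source_app_llr(llr_syndrome, llr_parity_extrinsic, llr_side):
    """Approximate APP LLR of x = s xor p after turbo decoding."""
    return llr_side + boxplus(llr_syndrome, llr_parity_extrinsic)

# Error-free TC: a received syndrome bit equal to 0 has LLR +inf, equal to 1 has -inf.
print(parity_input_llr(math.inf, 1.2))     # s = 0: p behaves like x
print(parity_input_llr(-math.inf, 1.2))    # s = 1: p behaves like the complement of x
print(source_app_llr(2.0, 3.0, 0.5))
```

With an error-free TC the input likelihood for the parity bit reduces to that of the corresponding source bit, possibly with its sign flipped, while finite syndrome LLRs (for example from a noisy TC) are handled by the same expressions.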

If the CC is a forward BSC, $p(\mathbf{x}\mathbf{y}) = p(\mathbf{x}) l_{\mathbf{y}}(\mathbf{x}) = p(\mathbf{x}) p_{Z^f}(\mathbf{y} \oplus \mathbf{x})$;
if the CC is a backward BSC, $p(\mathbf{x}\mathbf{y}) \propto p_{\mathbf{y}}(\mathbf{x}) = p_{Z^b}(\mathbf{x} \oplus \mathbf{y})$
(the same holds for $p(\mathbf{x}_i\mathbf{y}_i)$). In the first
case, MAP decoding is feasible only if $p(\mathbf{x})$ is also known at the
decoder. Since we assume that the decoder is not aware of this
information, we actually operate it without that factor (i.e. as if
X was u.d.), performing what is known as maximum likelihood
(ML) decoding. In both cases, the SCSI decoder reuses in
the best possible way the traditional turbo decoding algorithm,
without the need to design any particular parity/syndrome
manipulation or inversion.
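For a single binary pair, the contribution of the factor $p(\mathbf{x}\mathbf{y})$ can be written as an LLR. The following sketch assumes the convention LLR = log(P[x = 0] / P[x = 1]) and a binary side information; the function name and its optional argument q = P[X = 1] are illustrative assumptions. Leaving q unspecified corresponds to the ML operation described above (and to the backward model), while providing it adds the source prior of the forward model.

```python
import math

def side_info_llr(y, p, q=None):
    """LLR of a source bit x contributed by the factor p(xy).

    y: observed side-information bit (0 or 1)
    p: crossover probability of the virtual BSC
    q: P[X = 1] if known at the decoder (forward model only), else None
    """
    llr = (1 - 2 * y) * math.log((1 - p) / p)    # BSC term, same in both models
    if q is not None:
        llr += math.log((1 - q) / q)             # source prior p(x)
    return llr

print(side_info_llr(0, 0.1))          # ML / backward model
print(side_info_llr(0, 0.1, q=0.8))   # forward model with known prior
```

When q = 1/2 the prior term vanishes and the two cases coincide, consistent with the remark that the direction of the BSC does not matter for a u.d. source.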

D. Discussion 

The turbo-syndrome formation algorithm described in Section
III-B corresponds to the algorithm used in [?] and to the
"zero-forcing" algorithm described in [15] and in other papers
by the same authors. While we directly tackle MAP decoding
of the turbo-syndrome, the front-end of the decoder used in
[15] consists of a hard-in hard-out inverse syndrome former
(ISF). With binary TC-output and side information, the straight
ISF-based decoder implements exactly what we named
hard-syndrome decoding.

The factor-graph approach has certain advantages with re- 
spect to the utilization of an ISF. First, neither the TC-output 
nor the side information (in SCSI problems) are restricted to be 
binary in order to perform decoding. Then, in case of an error- 
prone TC (a BSC is for example tested in [28]), the optimum 
input likelihoods to the traditional turbo-decoding algorithm are 
immediately known without the need to analyze the signal flow 
through the ISF. Finally, syndromes which are not formed for 
both constituent codes according to the "zero-forcing" approach, 
for which the ISF is difficult, if not unfeasible, to construct, can 
be handled too. 

However, the "zero-forced" syndrome is not really robust 
against TC errors. In fact, while source bits belonging to $x$ 
are effectively "protected" by the turbo code, any source bit 
belonging to $x_i$ participates in only a single check for syndrome 
formation, such that erasures or flips of a syndrome bit make 
the correct recovery of the corresponding source bit very hard. 
By using a polynomial parity-check matrix, as proposed in [28] 
and [16] for convolutional and turbo codes respectively, more 
robust syndromes have been found that in particular support 
puncturing. However, not only can the resulting encoder no longer 
rely on traditional turbo-encoding algorithms, but efficient 
decoding must also be performed on a more complex trellis 
(named super trellis in [16]) that no longer shares the same 
transitions of the original trellis. 

IV. Experiments and Comparisons 

Experiments have been done under both the backward and the 
forward BSC correlation models. In the latter case, we focused 
on the case of non-u.d. sources. We also compared our results 
with many others from the literature. 



A. Experimental Setup 

The same turbo code and the same data frame length $L$ 
have been employed for both parity- and syndrome-based 
approaches. In particular, the turbo code uses two identical 
$(n, k) = (2, 1)$, 16-state, systematic constituent encoders with 
generator matrix G(D) 

1 D'^+D-'+D+l 

and a random interleaver in between. Two different frame lengths have been 
considered, namely $L = 2^{14} = 16384$ samples ("short" frame) 
and $L = 2^{16} = 65536$ samples ("long" frame). Random 
puncturing of the parity bits is employed for rate adaptation. 
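
Random puncturing for rate adaptation can be sketched as follows. The sketch assumes that the compressed rate is the number of transmitted parity bits per source sample and that the two (2, 1) constituent encoders together produce 2L parity bits per frame; the function name, its parameters, and the use of a seeded NumPy generator are illustrative choices, not details taken from the experiments.

```python
import numpy as np

def random_puncture_mask(num_parity_bits, frame_length, rate, rng):
    """Return a boolean mask selecting the parity bits that are transmitted.

    rate is the target compressed rate in bit/sample, so that
    round(rate * frame_length) parity bits are kept, chosen uniformly
    at random without repetition.
    """
    keep = int(round(rate * frame_length))
    if not 0 <= keep <= num_parity_bits:
        raise ValueError("target rate not reachable with the available parity bits")
    mask = np.zeros(num_parity_bits, dtype=bool)
    mask[rng.choice(num_parity_bits, size=keep, replace=False)] = True
    return mask

# Example: "short" frame, two parity streams of L bits each, rate 1/2.
L = 16384
mask = random_puncture_mask(2 * L, L, 0.5, np.random.default_rng(0))
```

The decoder would need the same mask (for instance by sharing the seed) in order to treat the punctured positions as erased.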

All decoding routines are set for a (maximum) number of 
runs of the FBA algorithm equal to 40 (20 iterations for each 
code). However, in order to reduce the decoding complexity, 
a stopping criterion breaks the decoding task whenever both 
the constituent FBA algorithms indicate persistent and mutually 
consistent decoded codewords. As suggested in [32], during 
each FBA run the most probable transition at each time-step is 
evaluated in order to check if the sequence of all such transitions 
forms a valid codeword. In practice, we consider the last 4 
consecutive FBA runs and check if in all of them the same 
codeword is obtained. If this condition is met, the turbo loop for 
the current frame is stopped and the last computed likelihoods 
are emitted. 
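
The stopping rule described above can be sketched as a small helper that remembers the outcome of the most recent FBA runs. The class name, the `is_valid` flag (standing for the check that the most probable transitions form a valid codeword), and the representation of a codeword as a sequence of bits are assumptions made for illustration.

```python
from collections import deque

class StoppingCriterion:
    """Stop when the last `window` FBA runs all produced the same valid codeword."""

    def __init__(self, window=4):
        self.window = window
        self.history = deque(maxlen=window)

    def update(self, codeword, is_valid):
        """Record the outcome of one FBA run and report whether to stop.

        Runs whose most probable transitions do not form a valid codeword
        are stored as None, so they can never satisfy the criterion.
        """
        self.history.append(tuple(codeword) if is_valid else None)
        if len(self.history) < self.window:
            return False
        first = self.history[0]
        return first is not None and all(c == first for c in self.history)
```

In a decoding loop, `update` would be called once after each of the (at most) 40 FBA runs, and the loop would exit as soon as it returns `True`.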

Only error-free transmission channels have been considered in 
the simulations. For all choices of the simulation parameters, 
$2^{15} = 32768$ or $2^{13} = 8192$ frames (in the short and in the long 
case, respectively) have been generated, such that the average 
bit error ratio (BER) is eventually estimated over $2^{29} \approx 5 \cdot 10^8$ 
samples. 



B. Backward BSC 

We fixed the value of $p^b$ and measured the performance of 
the considered decoding algorithms as a function of the target 
coding rate $R_s$. Three cases were considered, namely $p^b = 0.10$, 
$p^b = 0.05$, and $p^b = 0.01$. The simulation results are reported 
in Fig. 3, Fig. 4, and Fig. 5, respectively. 

In all cases, the compression limit is given by $H(Z^b) = 
h(p^b)$. This limit is independent of the source statistics. In 
fact, although the curves shown refer to a uniform distribution, we 
also checked that exactly the same results are obtained with any 
distribution. In the plots we show the theoretical limit in terms 
Fig. 3. Comparison between the different SCSI algorithms based on turbo 
decoding, for a backward BSC model with $p^b = 0.10$ (and any source 
statistics). Curves: PD, HSD, and SSD for frame lengths 16384 and 65536, 
and the W-Z bound; horizontal axis: rate [bit/sample]. 




Fig. 4. Comparison between the different SCSI algorithms based on turbo 
decoding, for a backward BSC model with $p^b = 0.05$ (and any source 
statistics). Curves: PD, HSD, and SSD for frame lengths 16384 and 65536, 
and the W-Z bound; horizontal axis: rate [bit/sample]. 



Fig. 5. Comparison between the different SCSI algorithms based on turbo 
decoding, for a backward BSC model with $p^b = 0.01$ (and any source 
statistics). Curves: PD, HSD, and SSD for frame lengths 16384 and 65536, 
and the W-Z bound. 



of the rate-distortion function ("Wyner-Ziv bound") fTQ. The 
curve labelled "PD" refers to parity decoding, while the other 
ones refer to syndrome decoding, hard or soft. The length of 
the frames is given in parentheses. 

The SSD approach always presents a waterfall region closer 
to the Wyner-Ziv bound than the PD approach of the same 
length does. The gap between these curves tends to disappear 
when source and side information are very correlated ($p^b \to 0$). 
This seems to suggest that the parity approach may have in this 
case a theoretical limitation similar to the one found in Section 
III-A for the forward BSC case. The factor $n/k > 1$ responsible 
for the gap would in fact be closer to one (i.e. no loss) when 
fewer parity bits are formed. 

Although the waterfall region of SSD is closer to the Wyner-Ziv 
bound, the error floor associated with SSD is also higher than the 
one associated with PD, especially for the high correlation case. 
As the puncturing increases (i.e. as $R_s$ decreases) both PD and 
SSD present higher error floor regions and more irregular BER 
curves, probably due to the heavy and unoptimized puncturing 
of the parity of both constituent codes. This behavior is much 
more visible in the syndrome-based approach than it is in the 
parity-based approach, as can be seen, in particular, in Fig. 5. 

In both approaches, a sharper waterfall curve and a better 
performance are obtained with long frame lengths rather than 
with short frame lengths. This fact is reasonable since large 
interleaver lengths are likely to generate more randomly dis- 
tributed codewords. 

The HSD approach has a performance in between the one 
of SSD and the one of PD, at least for low correlations, but 
its error floor is rather high already for $p^b = 0.05$. Instead, for 
high correlation (for example when $p^b = 0.01$), it is worth 
noting that HSD performs very poorly with respect to both SSD 
and PD. 

1) Comparison with Other Systems: The results obtained 
for the backward BSC correlation model hold for any source 
distribution. They can hence be compared also with results that 
refer to a forward BSC model, at least as long as a uniform 
source distribution is assumed in the latter case. 

In Table I this comparison is given in terms of rate required 
for near-lossless compression. For systems based on channel 
codes, where a residual error is always expected, a BER < 10~^ 
is considered as the threshold for almost perfect reconstruction. 
The rates reported in the Table consider the case $p^b = 0.10$ or 
$p^b = 0.05$, and are divided in two sections, the first for parity- 
based methods and the second for syndrome-based ones. In 
both sections, the methods are sorted according to their average 
performance under the two working conditions. 
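
The theoretical limits and the gaps quoted in parentheses in the Table follow directly from the binary entropy function. A minimal sketch of that arithmetic is given below; the function names are illustrative, and the example rate is the one reported in the Table for the "long" SSD method at $p^b = 0.10$.

```python
import math

def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def gap_from_limit(rate, p):
    """Excess rate (bit/sample) over the compression limit h(p)."""
    return rate - binary_entropy(p)

limit_010 = round(binary_entropy(0.10), 3)       # 0.469
limit_005 = round(binary_entropy(0.05), 3)       # 0.286
ssd_long_gap = round(gap_from_limit(0.528, 0.10), 3)  # 0.059
```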

As a brief comment on these results, we highlight the fact 
that in the first section of the Table (parity-based approaches) 
the "short" PD method performs only slightly worse than the 
"Turbo parity" method of the same frame length proposed in 
[23], which in turn is outperformed only by the "long" PD 
method. As for the syndrome-based approaches, the 
method "P&C trellis" [14] is placed between the "short" and 
1 Although this function is derived for uniform source and side distribution, 
we use it also in the non-uniform cases. Note that, however, for BER $\to 0$ 
this function converges to $H(X|Y)$, which is the lossless compression limit 
independently of the source statistics. 
TABLE I 

Comparison between different SCSI methods: the minimum rate 
is shown, for $p^b = .10$ and $p^b = .05$. In parentheses, the gap from the 
theoretical limit is shown. The frame length is reported too. In both parts 
of the Table (referring to parity- and syndrome-based approaches 
respectively) the various methods are sorted by increasing 
performance. 

Method                                    $p^b = .10$      $p^b = .05$ 
$H(Z^b) = h(p^b)$                         .469             .286 
------------------------------------------------------------------------ 
bzip2 (16384) [23]                        .670 (+.201)     .440 (+.154) 
Turbo parity, 8-state (16384) [20]        .630 (+.161)     .435 (+.149) 
Turbo parity (10000) [25]                 .590 (+.121)     .440 (+.154) 
gzip (16384) [23]                         .600 (+.131)     .410 (+.124) 
PD, 16-state (16384)                      .600 (+.131)     .394 (+.108) 
Turbo parity, 8-state (16384) [23]        .580 (+.111)     .380 (+.094) 
PD, 16-state (65536)                      .576 (+.107)     .374 (+.088) 
------------------------------------------------------------------------ 
LDPC (16384) [12]                         .600 (+.131)     .402 (+.116) 
SSD, 16-state (16384)                     .549 (+.080)     .398 (+.112) 
P&C trellis, 8-state (16384) [14]         .556 (+.087)     .388 (+.102) 
SSD, 16-state (65536)                     .528 (+.059)     .359 (+.073) 


Fig. 6. Comparison between different SCSI methods, at fixed rate $R_s = 2/3$ 
bit/sample. The label "SF+ISF" refers to the syndrome-based method in [15] 
(results for two different convolutional codes are shown); the label "Syn. trellis" 
refers to the syndrome-based method in [16], where 16-state constituent codes 
are employed. The frame length is reported in parentheses. Curves: PD (65536), 
SSD (65536), SF+ISF (30000), Syn. trellis (300000), and the W-Z bound; 
horizontal axis: $H(Z^b) = h(p^b)$. 



the "long" SSD methods. Even though these comparisons can 
be considered a little unfair since systems are based on different 
convolutional codes with different number of states, and on 
frames of different sizes, these results have been reported in 
order to give an idea on how the considered decoding techniques 
behave with respect to other systems known in literature. 

A fairer comparison is indeed given in Fig. 6, in which 
the BER as a function of $h(p^b)$ is shown at rate $R_s = 2/3$ 
bit/sample. In this Figure, it can be seen that the proposed 
"long" SSD method outperforms the 
"SF+ISF" method given in [15]. In fact, as observed in Section 
III-D, these two methods are very similar, but the latter is 
based on the sub-optimal HSD algorithm. Despite the very large 
interleaver length, the "Syn. trellis" method proposed in [16] has 
instead a very poor performance, which is even worse than the 
performance of the "long" PD method. 

Finally, Fig. 7 shows some results relative to $R_s = 1/2$ 



Fig. 7. Comparison between different SCSI methods, at fixed rate $R_s = 1/2$ 
bit/sample. The label "Turbo parity" refers to the parity-based method in [21], 
that uses two (5, 4) 16-state constituent codes. The label "LDPC" refers to 
the syndrome-based method in [12] (results relative to two irregular LDPC 
codes are shown); the label "P&C trellis" refers to the syndrome-based method 
in [14] that uses 16-state constituent codes; the label "Syn. trellis" refers to 
the syndrome-based method in [16] (16-state). The frame length is reported in 
parentheses. Curves: PD (65536), Turbo parity (100000), SSD (65536), 
LDPC (100000), P&C trellis (100000), Syn. trellis (200000), and the W-Z bound; 
horizontal axis: $H(Z^b) = h(p^b)$. 



bit/sample. In this case the proposed "long" SSD method has 
again a good performance, which is surpassed only by the 
LDPC-based systems reported in [12] (which employ a longer 
frame size) and by the "P&C trellis" method proposed in 
[14], which makes use of longer frames and of different 16- 
state constituent codes (specifically tailored for heavy data 
puncturing). Again, despite its error-resilience properties and 
the very long frame size, the "Syn. trellis" method [16] has 
very poor performance. 

C. Forward BSC model 

In the forward BSC scenario, we focused on the case where 
the source is not u.d. (if it were u.d., then the model would be 
equivalent to the backward one analyzed above). In particular, 
we considered either that the probability of the source being 
one is $q = 0.15$ or that it is $q = 0.20$. As in the backward 
case, the decoder is not informed about this. In both cases, we 
assigned values of $p^f$ in order to obtain a target $H(X|Y)$. In 
particular, the targets that we employed are $h(0.10)$, $h(0.05)$, 
and $h(0.01)$, so that the expected optimal compression rates 
(but we know already that we will not operate at optimality) are 
equal to the ones expected in the previous section, relative to 
the backward BSC model. The experimental results are shown 
in Fig. 8, Fig. 9, and Fig. 10, respectively for these three targets. 
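
Choosing the forward-BSC crossover probability that yields a given target $H(X|Y)$ can be sketched numerically. The sketch relies on the standard identity $H(X|Y) = h(q) + h(p) - h(q(1-p) + (1-q)p)$ for a Bernoulli($q$) source observed through a BSC with crossover $p$, and on the fact that this quantity grows with $p$ on $[0, 0.5]$; the function names and the bisection tolerance are illustrative assumptions, not the procedure actually used for the experiments.

```python
import math

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def cond_entropy_forward(q, p):
    """H(X|Y) for X ~ Bernoulli(q) and Y = X xor Z, with Z ~ Bernoulli(p)."""
    prob_y_one = q * (1.0 - p) + (1.0 - q) * p
    return h(q) + h(p) - h(prob_y_one)

def crossover_for_target(q, target, tol=1e-12):
    """Bisection on p in [0, 0.5] such that H(X|Y) equals the target."""
    if not 0.0 <= target <= h(q):
        raise ValueError("target must lie between 0 and H(X) = h(q)")
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cond_entropy_forward(q, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: q = 0.15 with target H(X|Y) = h(0.10).
p_forward = crossover_for_target(0.15, h(0.10))
```

For a non-uniform source the resulting crossover satisfies `h(p_forward) >= h(0.10)`, while for `q = 0.5` the solver returns the target crossover itself, in line with the equivalence of the two models for a u.d. source.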

The plots compare the performance of the PD method, of the 
SSD method and of the quantized parity-based approach (QPD 
method), and all are relative to a "long" frame. In the latter 
case, the lossy parity quantization is actually simulated by (i) 
adding to the parity bits a random binary noise having the same 
statistics of , (ii) assuming the rate is reduced by a factor 
1 — H{Z^). Note that, since at most the PD method permits to 
operate at $R_s = 2$ bit/sample, the maximum rate of the QPD 
method decreases to [1 — H{Z^)]Rs, and it may be possible 
that no waterfall behavior could be seen, not even operating at 
this maximum rate. This is the reason why, for $q = 0.15$ and 
Fig. 8. Comparison between the different SCSI algorithms based on turbo 
decoding, for a forward BSC model with fixed $H(X|Y) = h(0.10)$, and 
$q = 0.15$ or $q = 0.20$. 




Fig. 9. Comparison between the different SCSI algorithms based on turbo 
decoding, for a forward BSC model with fixed $H(X|Y) = h(0.05)$, and 
$q = 0.15$ or $q = 0.20$. Horizontal axis: rate [bit/sample]. 
Fig. 10. Comparison between the different SCSI algorithms based on turbo 
decoding, for a forward BSC model with fixed $H(X|Y) = h(0.01)$, and 
$q = 0.15$. 



$H(X|Y) = h(0.10)$ the QPD curve is not shown; similarly, 
since the SSD method is limited to $R_s = 2/3$ bit/sample, under 
the same settings no waterfall behavior could be seen, so the 
corresponding curve was not shown. 

The theoretical limit found in Section III-A, namely $H(Z^f) \geq 
H(X|Y)$, is also shown in the plots. Since in the case of 
$H(X|Y) = h(0.01)$ the theoretical losses for $q = 0.15$ and 
$q = 0.20$ are about the same (see Fig. 1), only the former 
case was investigated. The curve relative to the SSD method 
applied to a backward correlation model with the same conditional 
entropy (compression limit) and the same source distribution is 
shown too, in order to emphasize how different the 
experimental performance can be under different correlation models 
which may appear to be the same. 

Similarly to the backward BSC case, we noticed that the 
SSD method is always better than the PD one, but we noticed 
also that the former has a higher error floor, especially for high 
correlation. However, the SSD method is always far from 
achieving the Slepian-Wolf bound, and also somewhat farther 
than expected from the theoretical limit $H(Z^f)$. Instead, if the 
model were "backward" with the same parameters, SSD would 
operate much closer to the compression bound. 

The differences between the $q = 0.15$ and the $q = 0.20$ 
case are very noticeable, suggesting that the performance should 
increase rapidly when approaching uniformity (see Fig. 1). 

The QPD method, which in theory should be better than the 
PD method and operate under the same bound as the SSD 
method, did not provide the expected results. In particular, it 
improved the PD performance only for $H(X|Y) = h(0.10)$, 
while it always degraded with respect to the PD performance 
in the other cases. In practice, even if this possibility was not 
tested, it could be possible that QPD improves with respect to 
PD if operated with a slightly higher rate (i.e. with less strong 
parity quantization). 

V. Conclusion 

In this paper, we reviewed the parity- and the syndrome- 
based approaches to the source coding problem with or without 
side information at the decoder. We discussed their theoretical 
limits, in particular in the case of a non-uniformly distributed 
source. Also, we recast the problem of decoding parities or 
syndromes formed with respect to turbo codes into a general 
maximum a posteriori probability problem. By using a factor- 
graph approach, we immediately devised how to take full 
advantage of the conventional iterative decoding algorithms 
traditionally employed in channel coding problems. We eventually 
used a unified perspective on the data reconstruction problem, 
that permits dealing straightforwardly with non-binary side 
information and with non-binary encoder-to-decoder transmission 
channels too. 

Finally, we analyzed the performance of many different 
compression systems. The performance comparisons clearly 
show the differences between the parity- and the syndrome- 
based approaches, that are not usually discussed in the literature, 
in a variety of settings. Our experiments confirm the limits 
found in the theoretical analysis. The performance comparison 
with several other state-of-the-art coding systems that have appeared in 
the literature validates the practical utilization of the presented 
coding methods. 